Loghub: A Large Collection of System Log Datasets towards Automated Log Analytics
Logs have been widely adopted in software system development and maintenance
because of the rich system runtime information they contain. In recent years,
the increasing size and complexity of software have led to rapid growth in log
volume. To handle these large volumes of logs efficiently and
effectively, a line of research focuses on intelligent log analytics powered by
AI (artificial intelligence) techniques. However, only a small fraction of
these techniques has reached successful deployment in industry, owing to the
lack of public log datasets and benchmarks built upon them. To fill this
significant gap between academia and industry and also facilitate more research
on AI-powered log analytics, we have collected and organized loghub, a large
collection of log datasets. In particular, loghub provides 17 real-world log
datasets collected from a wide range of systems, including distributed systems,
supercomputers, operating systems, mobile systems, server applications, and
standalone software. In this paper, we summarize the statistics of these
datasets, introduce some practical log usage scenarios, and present a case
study on anomaly detection to demonstrate how loghub facilitates the research
and practice in this field. At the time of writing, loghub
datasets have been downloaded over 15,000 times by more than 380 organizations
from both industry and academia.
Comment: Dataset available at https://zenodo.org/record/322717
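A common first step in the AI-powered log analytics that loghub supports is grouping raw log lines into event templates. The sketch below is a minimal, hypothetical illustration of that idea (it is not loghub's own parser): variable tokens such as numbers and hex IDs are masked so that structurally identical lines collapse to one template.

```python
import re
from collections import Counter

def to_template(line: str) -> str:
    """Mask variable tokens (hex IDs first, then numbers) so log lines
    with the same structure collapse to a single template."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    line = re.sub(r"\d+", "<NUM>", line)
    return line

# Toy log lines standing in for a real loghub dataset.
logs = [
    "Connection from 10.0.0.1 port 22",
    "Connection from 10.0.0.2 port 443",
    "Disk error at block 0x1f3a",
]

# Count how often each template occurs; the two "Connection" lines
# collapse into one template, the disk-error line into another.
templates = Counter(to_template(line) for line in logs)
```

Real log parsers (e.g., those benchmarked on loghub) use far more robust template extraction, but the masking-and-counting pattern above captures the core preprocessing step.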
ROME: Testing Image Captioning Systems via Recursive Object Melting
Image captioning (IC) systems aim to generate a text description of the
salient objects in an image. In recent years, IC systems have been increasingly
integrated into our daily lives, such as assistance for visually-impaired
people and description generation in Microsoft PowerPoint. However, even the
cutting-edge IC systems (e.g., Microsoft Azure Cognitive Services) and
algorithms (e.g., OFA) could produce erroneous captions, leading to incorrect
captioning of important objects, misunderstanding, and threats to personal
safety. The existing testing approaches either fail to handle the complex form
of IC system output (i.e., sentences in natural language) or generate unnatural
images as test cases. To address these problems, we introduce Recursive Object
MElting (Rome), a novel metamorphic testing approach for validating IC systems.
Different from existing approaches that generate test cases by inserting
objects, which easily makes the generated images unnatural, Rome melts (i.e.,
removes and inpaints) objects. Rome assumes that the object set in the caption of
an image includes the object set in the caption of a generated image after
object melting. Given an image, Rome can recursively remove its objects to
generate different pairs of images. We use Rome to test one widely-adopted
image captioning API and four state-of-the-art (SOTA) algorithms. The results
show that the test cases generated by Rome look much more natural than those of
the SOTA IC testing approach and achieve naturalness comparable to the original
images. Meanwhile, by generating test pairs using 226 seed images, Rome reports
a total of 9,121 erroneous issues with high precision (86.47%-92.17%). In
addition, we further utilize the test cases generated by Rome to retrain
Oscar, which improves its performance across multiple evaluation metrics.
Comment: Accepted by ISSTA 202
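Rome's metamorphic relation can be checked without any ground-truth captions: after melting (removing) objects from an image, the caption of the melted image should introduce no object absent from the original image's caption. The sketch below illustrates that subset check; the function names and the naive vocabulary-based object extraction are hypothetical simplifications, not Rome's actual implementation.

```python
def extract_objects(caption: str, vocab: set) -> set:
    """Naive object extraction: keep caption words that appear in a
    known object vocabulary (a stand-in for real caption parsing)."""
    words = {w.strip(".,").lower() for w in caption.split()}
    return words & vocab

def violates_melting_relation(orig_caption: str,
                              melted_caption: str,
                              vocab: set) -> bool:
    """Rome's metamorphic relation: the object set of the melted image's
    caption must be a subset of the original caption's object set.
    Returns True when the relation is violated (a suspected bug)."""
    return not (extract_objects(melted_caption, vocab)
                <= extract_objects(orig_caption, vocab))

VOCAB = {"dog", "ball", "person", "car"}

# Original image: a dog chasing a ball; melted image: the ball removed.
ok = violates_melting_relation("A dog chases a ball.", "A dog runs.", VOCAB)
# A caption hallucinating a "person" after melting violates the relation.
bug = violates_melting_relation("A dog chases a ball.",
                                "A person and a dog.", VOCAB)
```

Because the relation only compares the system's own outputs on an image pair, it sidesteps the oracle problem of judging free-form natural-language captions directly.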
Cu2O@PNIPAM core–shell microgels as novel inkjet materials for the preparation of CuO hollow porous nanocubes gas sensing layers
There has been long-standing interest in developing metal oxide-based sensors with high sensitivity, selectivity, fast response and low material consumption. Here we report for the first time the utilization of Cu2O@PNIPAM core–shell microgels with a nanocube-shaped core structure for the construction of novel CuO gas sensing layers. The hybrid microgels show significantly improved colloidal stability compared to native Cu2O nanocubes. Consequently, a homogeneous thin film of Cu2O@PNIPAM nanoparticles can be engineered at a quite low solid content (1.5 wt%) by inkjet printing of the dispersion at an optimized viscosity and surface tension. Most importantly, thermal treatment of the Cu2O@PNIPAM microgels forms porous CuO nanocubes, which show a much faster response to relevant trace NO2 gases than sensors produced from bare Cu2O nanocubes. This outcome is due to the fact that the PNIPAM shell successfully hinders the aggregation of CuO nanoparticles during pyrolysis, which enables full utilization of the sensor layers and better access of the gas to active sites. These results point to the great potential of such an innovative system as a gas sensor with low cost, fast response and high sensitivity.
H. J. gratefully acknowledges financial support of the CSC scholarship. S. P. acknowledges funding from the Community of Madrid under grant number 2016-T1/AMB-1695
ImDiffusion: Imputed Diffusion Models for Multivariate Time Series Anomaly Detection
Anomaly detection in multivariate time series data is of paramount importance
for ensuring the efficient operation of large-scale systems across diverse
domains. However, accurately detecting anomalies in such data poses significant
challenges. Existing approaches, including forecasting and reconstruction-based
methods, struggle to address these challenges effectively. To overcome these
limitations, we propose a novel anomaly detection framework named ImDiffusion,
which combines time series imputation and diffusion models to achieve accurate
and robust anomaly detection. The imputation-based approach employed by
ImDiffusion leverages the information from neighboring values in the time
series, enabling precise modeling of temporal and inter-correlated
dependencies and reducing uncertainty in the data, thereby enhancing the
robustness of the anomaly detection process. ImDiffusion further employs
diffusion models as time series imputers to accurately capture complex
dependencies. We use the step-by-step denoised outputs generated during
the inference process to serve as valuable signals for anomaly prediction,
resulting in improved accuracy and robustness of the detection process.
We evaluate the performance of ImDiffusion via extensive experiments on
benchmark datasets. The results demonstrate that our proposed framework
significantly outperforms state-of-the-art approaches in terms of detection
accuracy and timeliness. ImDiffusion has further been integrated into a real
production system at Microsoft, where we observe a remarkable 11.4% increase in
detection F1 score compared to the legacy approach. To the best of our
knowledge, ImDiffusion represents a pioneering approach that combines
imputation-based techniques with time series anomaly detection, while
introducing the novel use of diffusion models to the field.
Comment: To appear in VLDB 2024. Code: https://github.com/17000cyh/IMDiffusion.gi